
    Rolling Locomotion of Cable-Driven Soft Spherical Tensegrity Robots

    Soft spherical tensegrity robots are novel steerable mobile robotic platforms that are compliant, lightweight, and robust. The geometry of these robots is suitable for rolling locomotion, and they achieve this motion by properly deforming their structures using carefully chosen actuation strategies. The objective of this work is to consolidate and add to our research to date on methods for realizing rolling locomotion of spherical tensegrity robots. To predict the deformation of tensegrity structures when their member forces are varied, we introduce a modified version of the dynamic relaxation technique and apply it to our tensegrity robots. In addition, we present two techniques to find desirable deformations and actuation strategies that result in robust rolling locomotion. The first relies on a greedy search that finds solutions quickly; the second uses a multi-generation Monte Carlo method that finds higher-quality, though still suboptimal, solutions. The methods are illustrated and validated both in simulation and on our hardware robots, showing that they are viable means of realizing robust and steerable rolling locomotion of spherical tensegrity robots.
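
    The numerical workhorse named in the abstract is dynamic relaxation. As a rough illustration only, the sketch below implements a plain (unmodified) dynamic-relaxation solver for a pin-jointed cable/rod network; the function name, arguments, and slack-cable handling are our assumptions, not the authors' modified algorithm.

```python
import numpy as np

def dynamic_relaxation(pos, members, fixed=(), dt=0.01,
                       damping=0.98, tol=1e-6, max_iters=100000):
    """Minimal dynamic-relaxation sketch (illustrative API): integrate
    damped pseudo-dynamics until the net force on every free node
    vanishes, giving the deformed equilibrium shape for the current
    set of member rest lengths.

    pos:     (n, 3) array of node coordinates
    members: iterable of (i, j, rest_length, stiffness, is_cable)
    """
    pos = pos.astype(float).copy()
    vel = np.zeros_like(pos)
    fixed = list(fixed)
    for _ in range(max_iters):
        forces = np.zeros_like(pos)
        for i, j, L0, k, is_cable in members:
            d = pos[j] - pos[i]
            L = np.linalg.norm(d)
            if is_cable and L <= L0:
                continue                    # slack cables carry no compression
            f = k * (L - L0) * d / L        # axial force along the member
            forces[i] += f
            forces[j] -= f
        if fixed:
            forces[fixed] = 0.0             # anchored nodes stay put
        if np.linalg.norm(forces) < tol:
            break                           # static equilibrium reached
        vel = damping * vel + dt * forces   # unit nodal masses assumed
        pos += dt * vel
    return pos
```

    Shortening a cable's rest length and re-solving approximates the deformation the robot uses to roll; the paper's modified variant and its actuation-strategy search sit on top of a solver of this kind.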

    Multi Agent Reward Analysis for Learning in Noisy Domains

    In many multi-agent learning problems, it is difficult to determine, a priori, the agent reward structure that will lead to good performance. This problem is particularly pronounced in continuous, noisy domains ill-suited to the simple table backup schemes commonly used in TD(lambda)/Q-learning. In this paper, we present a new reward evaluation method that allows the tradeoff between coordination among the agents and the difficulty of the learning problem each agent faces to be visualized. This method is independent of the learning algorithm and is only a function of the problem domain and the agents' reward structure. We then use this reward efficiency visualization method to determine an effective reward without performing extensive simulations. We test this method in both a static and a dynamic multi-rover learning domain where the agents have continuous state spaces and where their actions are noisy (e.g., the agents' movement decisions are not always carried out properly). Our results show that in the more difficult dynamic domain, the reward efficiency visualization method provides a two-order-of-magnitude speedup in selecting a good reward. Most importantly, it allows one to quickly create and verify rewards tailored to the observational limitations of the domain.
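
    The abstract does not spell out the evaluation method itself. As a heavily hedged sketch of one standard ingredient from the collectives literature this line of work builds on, the snippet below estimates "factoredness": the fraction of sampled state pairs in which an agent's reward and the system reward move in the same direction. All names and the sampling scheme are illustrative, not the paper's exact visualization.

```python
def estimate_factoredness(g_i, G, sample_pairs):
    """Fraction of sampled state pairs (differing in one agent's action)
    where the agent reward g_i and system reward G change in the same
    direction. High factoredness means pursuing g_i tends to help G;
    g_i and G are assumed callables on full system states."""
    agree, total = 0, 0
    for z1, z2 in sample_pairs:
        dg = g_i(z1) - g_i(z2)
        dG = G(z1) - G(z2)
        if dg * dG > 0:
            agree += 1
        total += 1
    return agree / max(total, 1)
```

    Plotting a quantity like this against a learnability measure, for several candidate rewards, is the kind of tradeoff visualization the abstract describes using to pick a reward without full learning runs.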

    Controlling Tensegrity Robots through Evolution using Friction based Actuation

    Traditional robotic structures have limitations in planetary exploration, as their rigid structural joints are prone to damage in new and rough terrains. In contrast, robots based on tensegrity structures, composed of rods and tensile cables, offer a highly robust, lightweight, and energy-efficient alternative to traditional robots. In addition, tensegrity robots can be highly configurable by rearranging their topology of rods, cables, and motors. However, these highly configurable tensegrity robots pose a significant challenge for locomotion due to their complexity. This study investigates a control pattern for successful locomotion in tensegrity robots through an evolutionary algorithm. A twelve-rod hardware model is rapidly prototyped to utilize a new actuation method based on friction. A web-based physics simulation is created to model the twelve-rod tensegrity ball structure. Square waves are used as control policies for the actuators of the tensegrity structure. Monte Carlo trials are run to find the most successful number of amplitudes for the square-wave control policy. From the results, an evolutionary algorithm is implemented to find the most optimized solution for locomotion of the twelve-rod tensegrity structure. The software pattern, coupled with the new friction-based actuation method, can serve as the basis for highly efficient tensegrity robots in space exploration.
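
    To illustrate the control representation described above, here is a minimal sketch of a square-wave policy and an elitist evolutionary loop over its per-actuator parameters. The simulate() callable is an assumed stand-in for the paper's web-based physics simulation, and the parameter ranges, mutation scheme, and selection scheme are our choices, not the paper's.

```python
import random

def square_wave(t, amplitude, period, phase):
    """Square-wave actuator command alternating between +/- amplitude
    (illustrative parameterization; duty cycle fixed at 50%)."""
    return amplitude if ((t + phase) % period) < 0.5 * period else -amplitude

def mutate(policy, sigma=0.1):
    """Gaussian perturbation of every wave parameter."""
    return [tuple(p + random.gauss(0.0, sigma) for p in params)
            for params in policy]

def evolve(simulate, pop_size=20, generations=50, n_actuators=12):
    """Elitist evolutionary loop over per-actuator square-wave policies.
    simulate(policy) -> distance traveled; the physics model itself is
    outside this sketch."""
    rand_params = lambda: (random.uniform(0.0, 1.0),   # amplitude
                           random.uniform(0.5, 4.0),   # period (s)
                           random.uniform(0.0, 4.0))   # phase (s)
    pop = [[rand_params() for _ in range(n_actuators)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=simulate, reverse=True)
        elite = ranked[:pop_size // 4]      # keep the fittest quarter
        pop = elite + [mutate(random.choice(elite))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=simulate)
```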

    Improving Trust in Deep Neural Networks with Nearest Neighbors

    Deep neural networks are used increasingly for perception and decision-making in UAVs. For example, they can be used to recognize objects from images and decide what actions the vehicle should take. While deep neural networks can perform very well at complex tasks, their decisions may be unintuitive to a human operator. When a human disagrees with a neural network prediction, due to the black-box nature of deep neural networks, it can be unclear whether the system knows something the human does not or whether the system is malfunctioning. This uncertainty is problematic when it comes to ensuring safety. As a result, it is important to develop technologies for explaining neural network decisions for trust and safety. This paper explores a modification to the deep neural network classification layer to produce both a predicted label and an explanation to support its prediction. Specifically, at test time, we replace the final output layer of the neural network classifier with a k-nearest neighbor classifier. The nearest neighbor classifier produces 1) a predicted label through voting and 2) the nearest neighbors involved in the prediction, which represent the most similar examples from the training dataset. Because prediction and explanation are derived from the same underlying process, this approach guarantees that the explanations are always relevant to the predictions. We demonstrate the approach on a convolutional neural network for a UAV image classification task. We perform experiments using a forest trail image dataset and show empirically that the hybrid classifier can produce intuitive explanations without loss of predictive performance compared to the original neural network. We also show how the approach can be used to help identify potential issues in the network and training process.
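
    A minimal sketch of the hybrid classifier described above, assuming the network's penultimate-layer embedding is exposed as a callable feat_fn; the variable names and the use of L2 distance are our assumptions, not the paper's exact setup.

```python
import numpy as np
from collections import Counter

def knn_predict_with_explanation(feat_fn, train_feats, train_labels,
                                 train_images, x, k=5):
    """Replace the softmax layer at test time with a k-NN vote over
    penultimate-layer features. Returns the voted label plus the k
    most similar training examples as the explanation."""
    f = feat_fn(x)                               # embed the query input
    d = np.linalg.norm(train_feats - f, axis=1)  # L2 distance to all training embeddings
    idx = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    label = Counter(train_labels[idx]).most_common(1)[0][0]  # majority vote
    explanation = [train_images[i] for i in idx]
    return label, explanation
```

    Because the explanation is the very set of neighbors that cast the votes, it cannot drift from the prediction, which is the relevance guarantee the abstract highlights.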

    Quicker Q-Learning in Multi-Agent Systems

    Multi-agent learning in Markov Decision Problems is challenging because of the presence of two credit assignment problems: 1) how to credit an action taken at time step t for rewards received at t' > t; and 2) how to credit an action taken by agent i, considering the system reward is a function of the actions of all the agents. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning or TD(lambda). The second credit assignment problem is typically addressed either by hand-crafting reward functions that assign proper credit to an agent, or by making certain independence assumptions about an agent's state space and reward function. To address both credit assignment problems simultaneously, we propose Q Updates with Immediate Counterfactual Rewards-learning (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. Instead of assuming that an agent's value function can be made independent of other agents, this method suppresses the impact of other agents using counterfactual rewards. Results on multi-agent grid-world problems over multiple topologies show that QUICR-learning can achieve up to thirty-fold improvements in performance over both conventional and local Q-learning in the largest tested systems.
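
    A hedged sketch of a QUICR-style update, reconstructed from the abstract: the immediate reward is a counterfactual difference of the system reward, plugged into an otherwise standard tabular Q-learning backup. G, counterfact, and the tabular dictionary representation are illustrative assumptions, not the paper's API.

```python
from collections import defaultdict

def quicr_update(Q, s, a, s_next, actions, G, counterfact,
                 alpha=0.1, gamma=0.95):
    """One QUICR-style Q-update for a single agent i (our reading of
    the abstract). G(state) is the system reward; counterfact(state)
    returns the state with agent i's effect replaced by a null
    counterfactual, so the difference isolates agent i's contribution
    and suppresses noise injected by the other agents."""
    d_reward = G(s_next) - G(counterfact(s_next))   # immediate counterfactual reward
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (d_reward + gamma * best_next - Q[(s, a)])

# usage sketch: Q = defaultdict(float); then call quicr_update(...)
# once per observed transition, exactly as in ordinary Q-learning.
```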

    QUICR-learning for Multi-Agent Coordination

    Coordinating multiple agents that need to perform a sequence of actions to maximize a system-level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t' > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assignment problem is typically addressed with temporal difference methods such as Q-learning. The second credit assignment problem is typically addressed by creating custom reward functions. To address both credit assignment problems simultaneously, we propose "Q Updates with Immediate Counterfactual Rewards-learning" (QUICR-learning), designed to improve both the convergence properties and performance of Q-learning in large multi-agent problems. QUICR-learning is based on previous work on single-time-step counterfactual rewards described by the collectives framework. Results on a traffic congestion problem show that QUICR-learning is significantly better than a Q-learner using collectives-based (single-time-step counterfactual) rewards. In addition, QUICR-learning provides significant gains over conventional and local Q-learning. Additional results on a multi-agent grid-world problem show that the improvements due to QUICR-learning are not domain specific and can provide up to a ten-fold increase in performance over existing methods.
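
    For reference, the update sketched in code after the previous abstract can be written compactly. The notation below, with s_t^{-i} denoting the state with agent i's effect counterfactually removed, is ours, reconstructed from the two abstracts rather than taken from the paper.

```latex
% QUICR-style Q-update with an immediate counterfactual reward (our notation):
Q_i(s_t, a_t) \leftarrow Q_i(s_t, a_t)
  + \alpha \Bigl[ \underbrace{G(s_t) - G\bigl(s_t^{-i}\bigr)}_{\text{immediate counterfactual reward}}
  + \gamma \max_{a'} Q_i(s_{t+1}, a') - Q_i(s_t, a_t) \Bigr]
```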

    Super Ball Bot - Structures for Planetary Landing and Exploration

    Small, lightweight, and low-cost missions will become increasingly important to NASA's exploration goals for our solar system. Ideally, teams of dozens or even hundreds of small, collapsible robots, weighing only a few kilograms apiece, will be conveniently packed during launch and will reliably separate and unpack at their destination. Such teams will allow rapid, reliable in-situ exploration of hazardous destinations such as Titan, where imprecise terrain knowledge and unstable precipitation cycles make single-robot exploration problematic. Unfortunately, landing many such lightweight robots is difficult with conventional technology. Current robot designs are delicate, requiring combinations of devices such as parachutes, retrorockets, and impact balloons to minimize impact forces and to place the robot in a proper orientation. Instead, we propose to develop a radically different robot based on a "tensegrity" structure, built purely from tensile and compressive elements. Such robots can be lightweight, absorb strong impacts, are redundant against single-point failures, can recover from different landing orientations, and are easy to collapse and re-expand. We believe tensegrity robot technology can play a critical role in future planetary exploration.

    Learning sequences of actions in collectives of autonomous agents
